EN FR
EN FR


Section: New Results

Protein Sequences and Structures

Participants : Rumen Andonov [contact] , Antonio Mucherino, François Coste, Jacques Nicolas, Andres Burgos, Gaëlle Garet, Pavel Senin, Mathilde Le Boudic-Jamin.

  • Branch & Prune Algorithm: We proposed an extension of the Branch & Prune (BP) algorithm for the Discretizable Molecular Distance Geometry Problem (DMDGP) which is able to exploit all symmetries of the research domain of the corresponding combinatorial optimization problem [30]

  • Modeling protein sequences with long distance correlations To initiate this new line of research, we have set up a framework to learn context-free grammars on protein sequences based on the identification of conservation blocks and substitutability of non-terminals. A first implementation of the learning algorithms showed the interest of this approach [38]

  • Maximum Contact Map Overlap (CMO) is a popular measure for quantifying the similarity between protein structures. A new integer programming model was presented for CMO and an exact branch-and-bound algorithm was designed with bounds obtained by a novel Lagrangian relaxation. The efficiency of the approach was demonstrated on known benchmarks on which sets our approach significantly outperforms the best existing exact algorithms [3] .

  • Alignment of protein structures First successes were obtained on provably optimal pairwise alignment of protein inter-residue distance matrices, using the popular Dali scoring function. We proposed the first mathematical model for computing optimal structural alignments based on dense inter-residue distance matrices and present algorithm engineering techniques to handle the huge integer linear programs [22] . In a second paper, a strategy was proposed for sparsifying distance matrices in which only the distances needed for uniquely reconstructing the conformations of the proteins are kept. [31] .

  • Protein Family Identification Identification of protein families is a computational biology challenge that needs efficient and reliable methods. First, we used the comparison tool A_purva, which is based on Contact Map Overlap (CMO), to classify protein structure coming from the CATH database. The obtained results showed that A_purva was able to correctly classify 92% of the structures, and that introducing the notion of dominance drastically reduced the computational time needed for classifying the protein structures [33] . Then, we introduced this concept of dominance in a novel combined approach based on Distance Alignment Search Tool (DAST), which contains an exact algorithm with bounds. Our experiments showed that this method successfully finds the most similar proteins in a set without solving all instances [28] .

  • Local Protein Threading, sequence-structure alignment: A novel approach to PTP has been investigated. It aligns a part of a protein structure onto a protein sequence in order to detect local similarities [8] . [Online publication: http://www.sciencedirect.com/science/article/B6TYW-50G78H4-1/2/947312da7a7bbf175cab7b3288ba4f03]